Demo Datasets

To demonstrate how Feasc can be used to infer pathway activity, we prepared several public scRNA-seq datasets. Users can download the h5ad files directly.

Pbmc3k

import scanpy as sc
import numpy as np
import pandas as pd
import anndata as ad
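import scipy.io  # makes scipy.io.mmread available for the IFN dataset

# Read the 10x Genomics matrix, using gene symbols as variable names
adata = sc.read_10x_mtx('data/pbmc3k/', var_names='gene_symbols', cache=True)

# Attach the Seurat cell-type annotations, matched on cell barcode
cell_anno = pd.read_csv("data/pbmc3k/pbmc3k_seurat_annotation.txt", sep="\t", index_col=0)
adata.obs = adata.obs.join(cell_anno)
adata.obs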
                  seurat_annotations
AAACATACAACCAC-1  Memory_CD4_T
AAACATTGAGCTAC-1  B
AAACATTGATCAGC-1  Memory_CD4_T
AAACCGTGCTTCCG-1  CD14+_Mono
AAACCGTGTATGCG-1  NK
...               ...
TTTCGAACTCTCAT-1  CD14+_Mono
TTTCTACTGAGGCA-1  B
TTTCTACTTCCTCG-1  B
TTTGCATGAGAGGC-1  B
TTTGCATGCCTCAC-1  Naive_CD4_T

2700 rows × 1 columns

adata.write_h5ad('data/h5ad/pbmc3k.h5ad')
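The `join` call above aligns the annotation table to `adata.obs` by barcode, and `DataFrame.join` is a left join by default, so any barcode missing from the annotation file is kept with a missing value. A minimal sketch of a check for such cells follows; the barcodes and labels are toy values, not taken from the pbmc3k files.

```python
import pandas as pd

# Toy stand-ins for adata.obs and the annotation table (hypothetical barcodes).
obs = pd.DataFrame(index=["AAAC-1", "AAAG-1", "AAAT-1"])
cell_anno = pd.DataFrame(
    {"seurat_annotations": ["B", "NK"]},
    index=["AAAC-1", "AAAT-1"],
)

# Left join on the index: every row of obs is kept,
# and barcodes without an annotation get NaN.
joined = obs.join(cell_anno)

n_missing = int(joined["seurat_annotations"].isna().sum())
print(f"{n_missing} of {len(joined)} cells have no annotation")
```

Whether unannotated cells should be dropped or kept depends on the downstream analysis, so the sketch only counts them.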

IFN

Next, we load single-cell RNA sequencing data from the GSE96583 dataset, which contains immune cells under two conditions: with and without interferon (IFN) stimulation.

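# Read the count matrix and transpose it so that cells are rows and genes are columns
Mat = scipy.io.mmread("data/IFN/matrix.mtx")
# Cast here: the dtype argument of AnnData is deprecated in newer anndata releases
X = Mat.T.toarray().astype('int32')

# Cell metadata, indexed by barcode
obs = pd.read_csv("data/IFN/GSE96583_batch2.total.tsne.df.tsv", sep="\t")
obs = obs.set_index('cell_id')

# Gene symbols are in the second column of the genes file
genef = pd.read_csv("data/IFN/GSE96583_batch2.genes.tsv", header=None, sep="\t")
var = pd.DataFrame(genef[1])
var = var.set_index(1).rename_axis('GeneSymbol')

adata = ad.AnnData(X, obs=obs, var=var)
adata.var_names_make_unique()
adata.obs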
cell_id           tsne1       tsne2       ind   stim  cluster  cell             multiplets
AAACATACAATGCC-1  -4.277833   -19.294709  107   ctrl  5        CD4 T cells      doublet
AAACATACATTTCC-1  -27.640373  14.966629   1016  ctrl  9        CD14+ Monocytes  singlet
AAACATACCAGAAA-1  -27.493646  28.924885   1256  ctrl  9        CD14+ Monocytes  singlet
AAACATACCAGCTA-1  -28.132584  24.925484   1256  ctrl  9        CD14+ Monocytes  doublet
AAACATACCATGCA-1  -10.468194  -5.984389   1488  ctrl  3        CD4 T cells      singlet
...               ...         ...         ...   ...   ...      ...              ...
TTTGCATGCTAAGC-1  25.142392   6.603815    107   stim  6        CD4 T cells      singlet
TTTGCATGGGACGA-1  14.359657   10.965601   1488  stim  6        CD4 T cells      singlet
TTTGCATGGTGAGG-1  27.317997   7.933458    1488  stim  6        CD4 T cells      ambs
TTTGCATGGTTTGG-1  13.744084   9.347784    1244  stim  6        CD4 T cells      ambs
TTTGCATGTCTTAC-1  14.572118   -4.713942   1016  stim  5        CD4 T cells      singlet

29065 rows × 7 columns

adata.write_h5ad('data/h5ad/GSE96583_IFN.h5ad')

AML

We will also use the GSE154109 dataset, which contains scRNA-seq data from acute myeloid leukemia (AML) patients, to demonstrate how dimensionality reduction enhances cytokine activity inference. Cell type annotations were obtained from the TISCH2 database.

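adata = sc.read_h5ad("data/AML_GSE154109/AML_GSE154109.h5ad")
# Replace '@' with '_' in the cell barcodes; assigning to obs_names keeps adata consistent
adata.obs_names = adata.obs_names.str.replace('@', '_', regex=False)
obs = adata.obs
obs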
Cell                   UMAP_1     UMAP_2     Cluster  Celltype (malignancy)  Celltype (major-lineage)  Celltype (minor-lineage)  Patient  Sample  Tissue
P1_AAATGCCAGACTAGAT-1  12.036471  -3.891038  14       Immune cells           B                         B                         P1       AML1    Tumor
P1_AAGGTTCTCAACGGGA-1  11.871773  -4.192421  14       Immune cells           B                         B                         P1       AML1    Tumor
P1_ACACCAAGTACCGGCT-1  11.403178  -3.530503  14       Immune cells           B                         B                         P1       AML1    Tumor
P1_ACATCAGGTTTAAGCC-1  11.493506  -3.593977  14       Immune cells           B                         B                         P1       AML1    Tumor
P1_ACTTGTTGTGGCGAAT-1  11.426346  -3.142435  14       Immune cells           B                         B                         P1       AML1    Tumor
...                    ...        ...        ...      ...                    ...                       ...                       ...      ...     ...
P8_TTAGGCAAGTACGATA-1  -3.451650  3.087100   12       Immune cells           Mono/Macro                cDC1                      P8       AML8    Tumor
P8_TTCGGTCGTTCAGACT-1  -3.653885  2.907512   12       Immune cells           Mono/Macro                cDC1                      P8       AML8    Tumor
P8_TTCTCAACAAGCGCTC-1  -3.589192  2.918538   12       Immune cells           Mono/Macro                cDC1                      P8       AML8    Tumor
P8_TTCTCAAGTGTGACCC-1  -2.689308  3.016534   12       Immune cells           Mono/Macro                cDC1                      P8       AML8    Tumor
P8_TTGACTTCATGGATGG-1  -2.064048  2.886846   12       Immune cells           Mono/Macro                cDC1                      P8       AML8    Tumor

10799 rows × 9 columns
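The barcode renaming step for the AML data can be tried in isolation with pandas alone. The sketch below assumes barcodes of the form `Patient@Barcode`; the two example barcodes are hypothetical, shortened values. Passing `regex=False` makes the replacement a literal string substitution, and the uniqueness check guards against two barcodes collapsing into the same name.

```python
import pandas as pd

# Hypothetical barcodes in "Patient@Barcode" form.
obs = pd.DataFrame(
    {"Patient": ["P1", "P8"]},
    index=pd.Index(["P1@AAAT-1", "P8@TTGA-1"], name="Cell"),
)

# Literal replacement of '@' with '_' in every barcode.
obs.index = obs.index.str.replace("@", "_", regex=False)

# Renaming must not create duplicate cell names.
assert obs.index.is_unique

print(list(obs.index))
```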